Development lifecycle


SimCT: A Simple Consistency Test Protocol in LLMs Development Lifecycle

arXiv.org Artificial Intelligence

In this work, we report our efforts to advance the standard operating procedure for developing Large Language Models (LLMs) and LLM-based systems or services in industry. We introduce the concept of the Large Language Model Development Lifecycle (LDLC) and highlight the importance of consistency testing in ensuring delivery quality. A principled approach to consistency testing, however, is usually overlooked by industrial practitioners and is not a pressing concern in academia, while current practical solutions are insufficiently rigorous and labor-intensive. We therefore propose a simple yet effective consistency test protocol, named SimCT. SimCT is designed mainly to proactively check consistency across the development stages of "bare metal" LLMs or associated services without accessing the model artifacts, with the aim of expediting delivery by reducing back-and-forth alignment communication among the multiple teams involved in different development stages. Specifically, SimCT encompasses response-wise and model-wise tests. We implement the protocol with LightGBM and Student's t-test for the two components, respectively, and perform extensive experiments to substantiate the effectiveness of SimCT and its components.
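The abstract names Student's t-test for the model-wise component but gives no further detail. The following is a minimal Python sketch under stated assumptions, not SimCT's actual implementation: each development stage's model is run on the same prompt set, every response is reduced to one scalar score by some hypothetical metric, and two stages count as consistent when a two-sample Student's t-test (pooled variance) finds no significant difference in mean score. Python's standard library has no t-distribution CDF, so the caller supplies the two-sided critical value (about 2.306 for 8 degrees of freedom at α = 0.05). The names `student_t` and `model_wise_consistent` are illustrative.

```python
import math
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes equal population variances,
    i.e. the classic Student formulation rather than Welch's.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    diff = mean(sample_a) - mean(sample_b)
    if pooled_var == 0:
        # Both samples are constant: t is 0 if the means match, otherwise unbounded.
        return (0.0 if diff == 0 else math.copysign(math.inf, diff)), df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return diff / std_err, df


def model_wise_consistent(scores_stage_a, scores_stage_b, t_critical):
    """Hypothetical model-wise check: consistent if |t| stays below the critical value."""
    t, _df = student_t(scores_stage_a, scores_stage_b)
    return abs(t) < t_critical


# Hypothetical per-response scores from two development stages (5 prompts each, df = 8).
baseline = [0.71, 0.68, 0.74, 0.70, 0.69]
candidate = [0.70, 0.69, 0.73, 0.71, 0.68]
print(model_wise_consistent(baseline, candidate, t_critical=2.306))  # True
```

Because the same prompts are scored at both stages, a paired t-test would be a natural alternative; the abstract does not say which variant SimCT uses.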


Transforming Software Development: Evaluating the Efficiency and Challenges of GitHub Copilot in Real-World Projects

arXiv.org Artificial Intelligence

Generative AI technologies promise to transform the product development lifecycle. This study evaluates the efficiency gains, areas for improvement, and emerging challenges of using GitHub Copilot, an AI-powered coding assistant. We identified 15 software development tasks and assessed Copilot's benefits through real-world projects on large proprietary code bases. Our findings indicate significant reductions in developer toil, with up to 50% time saved in code documentation and autocompletion, and 30-40% in repetitive coding tasks, unit test generation, debugging, and pair programming. However, Copilot struggles with complex tasks, large functions, multiple files, and proprietary contexts, particularly with C/C++ code. We project a 33-36% time reduction for coding-related tasks in a cloud-first software development lifecycle. This study aims to quantify productivity improvements, identify underperforming scenarios, examine practical benefits and challenges, investigate performance variations across programming languages, and discuss emerging issues related to code quality, security, and developer experience.
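The abstract does not say how per-task savings are combined into the projected 33-36% figure. One plausible way is a time-weighted average, where each task's saving counts in proportion to the hours spent on it. This Python sketch assumes that approach; the task names, hours, and rates are placeholders, not the study's data.

```python
def projected_time_reduction(task_hours, savings_rate):
    """Time-weighted average saving across tasks.

    task_hours: hours a team spends per task (hypothetical baseline).
    savings_rate: fraction of that time an assistant saves per task;
                  tasks missing from the dict are assumed to save nothing.
    """
    total_hours = sum(task_hours.values())
    saved_hours = sum(hours * savings_rate.get(task, 0.0) for task, hours in task_hours.items())
    return saved_hours / total_hours


# Placeholder numbers for illustration only; they are not the study's measurements.
hours = {"documentation": 10, "unit_test_generation": 10, "complex_multi_file_changes": 20}
rates = {"documentation": 0.50, "unit_test_generation": 0.40}
print(f"{projected_time_reduction(hours, rates):.1%}")
```

With these placeholders the weighted saving is 22.5%, well below the 50% headline rate, because the hypothetical complex multi-file work (where the abstract says Copilot struggles) dominates the hours.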


An Empirical Study of Challenges in Machine Learning Asset Management

arXiv.org Artificial Intelligence

In machine learning (ML), efficient asset management, including ML models, datasets, algorithms, and tools, is vital for resource optimization, consistent performance, and a streamlined development lifecycle. This enables quicker iterations, adaptability, reduced development-to-deployment time, and reliable outputs. Despite existing research, a significant knowledge gap remains in operational challenges like model versioning, data traceability, and collaboration, which are crucial for the success of ML projects. Our study aims to address this gap by analyzing 15,065 posts from developer forums and platforms, employing a mixed-method approach to classify inquiries, extract challenges using BERTopic, and identify solutions through open card sorting and BERTopic clustering. We uncover 133 topics related to asset management challenges, grouped into 16 macro-topics, with software dependency, model deployment, and model training being the most discussed. We also find 79 solution topics, categorized under 18 macro-topics, highlighting software dependency, feature development, and file management as key solutions. This research underscores the need for further exploration of identified pain points and the importance of collaborative efforts across academia, industry, and the research community.
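The study lists model versioning and data traceability among the operational challenges. A common technique for both, not one prescribed by this study, is to identify each asset by a hash of its bytes and record which assets it was derived from. This is a minimal Python sketch of that idea; the `AssetRegistry` class and its methods are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class AssetRegistry:
    """Hypothetical content-addressed registry for ML assets (datasets, models, configs)."""

    records: dict = field(default_factory=dict)  # digest -> {"name": ..., "parents": [...]}

    def register(self, name, data: bytes, parents=()):
        """Store an asset under the SHA-256 of its bytes; identical bytes get the same ID."""
        digest = hashlib.sha256(data).hexdigest()
        self.records[digest] = {"name": name, "parents": list(parents)}
        return digest

    def lineage(self, digest):
        """Return asset names from `digest` back through everything it was derived from."""
        order, seen, stack = [], set(), [digest]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            order.append(self.records[current]["name"])
            stack.extend(self.records[current]["parents"])
        return order


registry = AssetRegistry()
data_id = registry.register("train.csv", b"x,y\n1,3\n2,5\n")
model_id = registry.register("model-v1", b"<serialized weights>", parents=[data_id])
print(registry.lineage(model_id))  # ['model-v1', 'train.csv']
```

Because identical bytes always hash to the same ID, a model record can be traced back to the exact dataset version it was trained on, regardless of what the file was named at the time.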


Towards Formal Fault Injection for Safety Assessment of Automated Systems

arXiv.org Artificial Intelligence

Reasoning about safety, security, and other dependability attributes of autonomous systems is a challenge that must be addressed before such systems can be adopted in day-to-day life. Formal methods are a class of techniques that reason mathematically about a system's behavior, so a correctness proof is, in principle, sufficient to establish the system's dependability. However, these methods are usually applied to abstract models of the system, which might not fully represent the actual system. Fault injection, on the other hand, is a testing method for evaluating the dependability of systems, but the amount of testing it requires is often prohibitively large. This vision paper introduces formal fault injection, a fusion of these two techniques throughout the development lifecycle to enhance the dependability of autonomous systems. We advocate a more cohesive approach by identifying five areas of mutual support between formal methods and fault injection. By forging stronger ties between the two fields, we pave the way for developing safe and dependable autonomous systems. This paper explores the potential of this integration and outlines future research avenues, addressing open challenges along the way.
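To make the combination concrete, here is a toy Python illustration, not the paper's method: a hypothetical braking controller with two redundant obstacle sensors is checked exhaustively (the formal side) against every single-sensor fault from a small, assumed fault model (the fault-injection side). Because the model is finite, enumerating all cases amounts to a proof for that model only; as the abstract notes, it says nothing about the real system unless the model is faithful.

```python
from typing import Callable, Dict, List, Tuple

Controller = Callable[[bool, bool], bool]

# Hypothetical fault model: each fault transforms the true reading of one sensor.
FAULTS: Dict[str, Callable[[bool], bool]] = {
    "none": lambda reading: reading,
    "stuck_false": lambda reading: False,
    "stuck_true": lambda reading: True,
}


def or_voting(sensor_a: bool, sensor_b: bool) -> bool:
    """Brake if either sensor reports an obstacle."""
    return sensor_a or sensor_b


def and_voting(sensor_a: bool, sensor_b: bool) -> bool:
    """Brake only if both sensors agree there is an obstacle."""
    return sensor_a and sensor_b


def check_single_fault_safety(controller: Controller) -> List[Tuple[str, str]]:
    """Inject every fault into every sensor and check the property
    'an obstacle always triggers braking'. Returns the violating (sensor, fault) pairs."""
    violations = []
    for obstacle in (False, True):
        for faulty_sensor in ("a", "b"):
            for fault_name, fault in FAULTS.items():
                reading_a = fault(obstacle) if faulty_sensor == "a" else obstacle
                reading_b = fault(obstacle) if faulty_sensor == "b" else obstacle
                if obstacle and not controller(reading_a, reading_b):
                    violations.append((faulty_sensor, fault_name))
    return violations


print(check_single_fault_safety(or_voting))   # []
print(check_single_fault_safety(and_voting))  # [('a', 'stuck_false'), ('b', 'stuck_false')]
```

OR-voting passes this property but would brake spuriously under a `stuck_true` fault; a fuller check would add "no obstacle means no braking" as a second property and expose that trade-off.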


Assessing AI system performance: thinking beyond models to deployment contexts - Microsoft Research

#artificialintelligence

AI systems are becoming increasingly complex as we move from visionary research to deployable technologies such as self-driving cars, clinical predictive models, and novel accessibility devices. Compared with individual AI models, it is more difficult to assess whether these more complex AI systems are performing consistently and as intended to realize human benefit. How do we know when these more advanced systems are 'good enough' for their intended use? When assessing the performance of AI models, we often rely on aggregate performance metrics such as percentage accuracy. But this ignores the many elements, often human ones, that make up an AI system. Our research on what it takes to build forward-looking, inclusive AI experiences has demonstrated that getting to 'good enough' requires multiple performance assessment approaches at different stages of the development lifecycle, based upon realistic data and key user needs (figure 1).
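The point about aggregate metrics can be shown in a few lines. This Python sketch uses a technique often called disaggregated evaluation; the record format and the deployment-context names are hypothetical. It reports accuracy overall and per context, so a context where the system fails is not hidden behind a high overall number.

```python
from collections import defaultdict


def accuracy(pairs):
    """Fraction of (prediction, label) pairs that match."""
    return sum(pred == label for pred, label in pairs) / len(pairs)


def disaggregated_accuracy(records):
    """records: iterable of (context, prediction, label).

    Returns (overall_accuracy, {context: accuracy}).
    """
    by_context = defaultdict(list)
    all_pairs = []
    for context, pred, label in records:
        by_context[context].append((pred, label))
        all_pairs.append((pred, label))
    return accuracy(all_pairs), {ctx: accuracy(pairs) for ctx, pairs in by_context.items()}


# Hypothetical results: 8 cases from clinic_A (all correct), 2 from clinic_B (both wrong).
records = [("clinic_A", 1, 1)] * 8 + [("clinic_B", 0, 1)] * 2
overall, per_context = disaggregated_accuracy(records)
print(overall)      # 0.8
print(per_context)  # {'clinic_A': 1.0, 'clinic_B': 0.0}
```

An 80% overall score looks acceptable, yet the system never succeeds in `clinic_B`; only the per-context view reveals that.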


Deploying Machine Learning Models with Heroku

#artificialintelligence

For starters, deployment is the process of integrating a trained machine learning model into a production environment, usually to serve end users. Deployment is typically the last stage in the development lifecycle of a machine learning product. The "Model Deployment" stage above consists of a series of steps, shown in the image below. For the purpose of this tutorial, I will use Flask to build the web application. In this section, let's train the machine learning model we intend to deploy. For simplicity, and so as not to divert from the primary objective of this post, I will deploy a linear regression model.
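The post itself uses Flask; to stay dependency-free, this sketch uses a plain WSGI callable instead (Flask applications are WSGI applications too, so the serving model is the same). It fits a one-feature linear regression by ordinary least squares on toy data and exposes a hypothetical `POST /predict` endpoint that takes `{"x": ...}` and returns `{"prediction": ...}`. The route, payload shape, and data are assumptions, not the tutorial's code.

```python
import json
from statistics import mean


def fit_linear_regression(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with a single feature."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy training data following y = 2x + 1; the model is fitted once when the app starts.
SLOPE, INTERCEPT = fit_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])


def handle_predict(body: bytes):
    """Parse {"x": number} and return (status, payload) for the /predict route."""
    try:
        x = float(json.loads(body)["x"])
    except (ValueError, KeyError, TypeError):
        return "400 Bad Request", {"error": "expected a JSON body with a numeric 'x'"}
    return "200 OK", {"prediction": SLOPE * x + INTERCEPT}


def app(environ, start_response):
    """WSGI entry point (the interface Flask apps also expose) routing POST /predict."""
    if environ.get("REQUEST_METHOD") == "POST" and environ.get("PATH_INFO") == "/predict":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        status, payload = handle_predict(environ["wsgi.input"].read(length))
    else:
        status, payload = "404 Not Found", {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]
```

Keeping the parsing in `handle_predict` separates model logic from transport, so it can be tested without a server. Locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves it; on Heroku, a Procfile line such as `web: gunicorn app:app` would start it, assuming the file is saved as `app.py` and gunicorn is listed as a dependency.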


Will Artificial Intelligence Bid Goodbye to Developers in 2022?

#artificialintelligence

Artificial intelligence solutions have transformed modern enterprises and business operations. Over the years, the evolution of AI into more advanced forms has changed our outlook on many fields. Similarly, AI is having a major impact on software development and testing. Experts anticipate increased use of artificial intelligence in software development to improve the efficiency of the software development lifecycle. Currently, most software enterprises are adopting emerging AI technologies in software development to keep pace with the competition.


AWS attendee guide for DevOps and Developer Productivity track at re:Invent 2021

#artificialintelligence

AWS re:Invent is a learning conference hosted by Amazon Web Services for the global cloud computing community. We are super excited to join you at the 10th annual re:Invent to share the latest from AWS leaders and discover more ways to learn and build. Let's celebrate this milestone, which will be offered both in person in Las Vegas (November 29–December 3) and virtually (November 29–December 10). The health and safety of our customers and partners remain our top priority; you can learn more on the health measures page. If you haven't already registered, don't forget to register and save your spot at your favorite sessions. The AWS DevOps and Developer Productivity track at re:Invent offers sessions that combine cultural philosophies, practices, and tools that increase an organization's ability to deliver applications and services at high velocity.


Elon is Right, AI is Hard: Five Pitfalls to Avoid in Artificial Intelligence

#artificialintelligence

During the recent Tesla AI Day event, Elon Musk said he discourages machine learning, "because it is really difficult. Unless you have to use machine learning, don't do it." Musk may well be right in his assessment, because machine learning is quite difficult to implement. Most companies want the benefits that artificial intelligence can bring to their business, but most don't have what it takes to get it up and running. As a result, as many as 85% of ML projects currently fail.


Run your TensorFlow job on Amazon SageMaker with a PyCharm IDE

#artificialintelligence

As more machine learning (ML) workloads go into production, many organizations must bring ML workloads to market quickly and increase productivity in the ML model development lifecycle. However, the ML model development lifecycle is significantly different from an application development lifecycle. This is due in part to the amount of experimentation required before finalizing a version of a model. Amazon SageMaker, a fully managed ML service, enables organizations to put ML ideas into production faster and improve data scientist productivity by up to 10 times. Your team can quickly and easily train and tune models, move back and forth between steps to adjust experiments, compare results, and deploy models to production-ready environments.